Geometric Dirichlet Means Algorithm for topic inference

Authors

  • Mikhail Yurochkin
  • XuanLong Nguyen

Abstract

We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end, we study the optimization of a geometric loss function, which is a surrogate for the LDA likelihood. Our method involves a fast, optimization-based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data.
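The abstract describes the method only at a high level: documents are grouped with a weighted clustering procedure, and the resulting cluster centers are then geometrically corrected into topic estimates. The sketch below illustrates that idea under our own assumptions, not as the authors' implementation. Each document is represented by its normalized word frequencies and weighted by its length; clustering is plain weighted k-means with a deterministic farthest-first initialization; and the "geometric correction" is assumed to push each centroid away from the global weighted center, along the ray through it, as far as the farthest document in its cluster but never past the boundary of the word simplex. The names `gdm_topics`, `weighted_kmeans`, and `farthest_first_init` are hypothetical, and the extension rule is an illustrative guess that may differ from the rule used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Vector], weights: List[float], k: int) -> List[Vector]:
    """Hypothetical deterministic seeding: start from the heaviest document,
    then repeatedly add the point farthest from every center chosen so far."""
    first = max(range(len(points)), key=lambda i: weights[i])
    centers = [list(points[first])]
    while len(centers) < k:
        nxt = max(range(len(points)),
                  key=lambda i: min(_sq_dist(points[i], c) for c in centers))
        centers.append(list(points[nxt]))
    return centers


def weighted_kmeans(points: List[Vector], weights: List[float], k: int,
                    iters: int = 100) -> Tuple[List[Vector], List[int]]:
    centers = farthest_first_init(points, weights, k)
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        new_assign = [min(range(k), key=lambda j: _sq_dist(p, centers[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: each centroid becomes the weighted mean of its members.
        for j in range(k):
            members = [i for i, a in enumerate(assign) if a == j]
            total = sum(weights[i] for i in members)
            if total > 0:
                centers[j] = [sum(weights[i] * points[i][d] for i in members) / total
                              for d in range(len(centers[j]))]
    return centers, assign


def gdm_topics(doc_counts: List[List[int]], k: int) -> List[Vector]:
    """Sketch of clustering plus an assumed geometric correction.
    Assumes every document contains at least one word."""
    weights = [float(sum(c)) for c in doc_counts]          # document lengths
    points = [[x / w for x in c] for c, w in zip(doc_counts, weights)]
    dim = len(points[0])
    total_w = sum(weights)
    center = [sum(w * p[d] for w, p in zip(weights, points)) / total_w
              for d in range(dim)]
    centroids, assign = weighted_kmeans(points, weights, k)
    topics = []
    for j, mu in enumerate(centroids):
        direction = [m - c for m, c in zip(mu, center)]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:
            topics.append(list(mu))
            continue
        # Assumed rule: extend the ray center -> mu as far as the farthest
        # document of cluster j (this factor is always >= 1) ...
        members = [points[i] for i, a in enumerate(assign) if a == j]
        m_ext = max((math.dist(x, center) for x in members), default=norm) / norm
        # ... but stop at the simplex boundary, where a coordinate reaches 0.
        t_max = min((c / -d for c, d in zip(center, direction) if d < 0),
                    default=m_ext)
        t = min(m_ext, t_max)
        beta = [max(0.0, c + t * d) for c, d in zip(center, direction)]
        s = sum(beta)  # guard against rounding drift
        topics.append([b / s for b in beta])
    return topics
```

For example, `gdm_topics([[8, 1, 1], [9, 1, 0], [1, 8, 1], [0, 9, 1], [1, 1, 8], [1, 0, 9]], k=3)` returns three topics, each dominated by a different word. The first topic places about 0.90 of its mass on word 0, compared with 0.85 for the raw cluster centroid. This illustrates why a correction step is needed: averaging documents that are mixtures of topics pulls centroids toward the interior of the simplex, so pushing them back out is one plausible way to recover sharper topic vectors.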

Similar resources

Approximate Mean Field for Dirichlet-Based Models

Variational inference is an important class of approximate inference techniques that has been applied to many graphical models, including topic models. We propose to improve the efficiency of mean field inference for Dirichlet-based models by introducing an approximative framework that converts weighted geometric means in the updates into weighted arithmetic means. This paper also discusses a c...

A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling

This technical report provides a tutorial on the theoretical details of probabilistic topic modeling and gives practical steps on implementing topic models such as Latent Dirichlet Allocation (LDA) through the Markov Chain Monte Carlo approximate inference algorithm Gibbs Sampling.

Neural Variational Inference For Topic Models

Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is neural variational inference (NVI), but it has proven difficult to apply to topic models in practice. We present what is to our knowledge t...

Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm

We present an effort to perform topic mixture-based language model adaptation using latent Dirichlet allocation (LDA). We use probabilistic latent semantic analysis (PLSA) to automatically cluster a heterogeneous training corpus, and train an LDA model using the resultant topic-document assignments. Using this LDA model, we then construct topic-specific corpora at the utterance level for interpol...

Autoencoding Variational Inference for Topic Models

Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is autoencoding variational Bayes (AEVB), but it has proven difficult to apply to topic models in practice. We present what is to our knowledge t...

Publication date: 2016